Topical Word Trigger Model for Keyphrase Extraction
Authors
Abstract
Keyphrase extraction aims to find representative phrases for a document. Keyphrases are expected to cover the main themes of a document. However, keyphrases do not necessarily occur frequently in the document, a problem known as the vocabulary gap between the words in a document and its keyphrases. In this paper, we propose the Topical Word Trigger Model (TWTM) for keyphrase extraction. TWTM assumes that the content and the keyphrases of a document describe the same themes but are written in different languages. Under this assumption, keyphrase extraction is modeled as a translation process from document content to keyphrases. Moreover, to better cover document themes, TWTM makes the trigger probabilities topic-specific, so that the trigger process is influenced by the document themes. On one hand, TWTM uses latent topics to model document themes and takes the coverage of those themes into consideration; on the other hand, TWTM uses topic-specific word triggers to bridge the vocabulary gap between the words in a document and its keyphrases. Experimental results on a real-world dataset show that TWTM outperforms existing state-of-the-art methods under various evaluation metrics.
KEYWORDS: keyphrase extraction, latent topic model, word trigger model.
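The abstract gives no equations, so the following is only a minimal sketch of how topic-specific word triggers could rank keyphrases. It assumes a candidate keyphrase k is scored as the sum over document words w and topics z of P(w|d) * P(z|d) * P(k|w,z). The function name `rank_keyphrases`, the toy topic distribution, and the toy trigger table are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_keyphrases(doc_words, topic_dist, trigger_prob, top_n=3):
    """Rank candidate keyphrases for one document.

    Assumed scoring (not the paper's exact formulation):
        score(k) = sum_w sum_z P(w|d) * P(z|d) * P(k|w,z)

    doc_words    -- list of tokens in the document
    topic_dist   -- dict topic -> P(topic | document)
    trigger_prob -- dict (word, topic) -> dict keyphrase -> P(keyphrase | word, topic)
    """
    counts = Counter(doc_words)
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        p_word = count / total  # empirical P(w|d)
        for topic, p_topic in topic_dist.items():
            # Words with no trigger entry for this topic contribute nothing.
            triggers = trigger_prob.get((word, topic), {})
            for phrase, p_trigger in triggers.items():
                scores[phrase] = scores.get(phrase, 0.0) + p_word * p_topic * p_trigger
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy, hand-made inputs (hypothetical values for illustration only).
doc_words = ["apple", "iphone", "apple", "launch"]
topic_dist = {"tech": 0.9, "food": 0.1}
trigger_prob = {
    ("apple", "tech"): {"smartphone": 0.6, "company": 0.4},
    ("apple", "food"): {"fruit": 1.0},
    ("iphone", "tech"): {"smartphone": 0.9, "company": 0.1},
}

ranked = rank_keyphrases(doc_words, topic_dist, trigger_prob)
for phrase, score in ranked:
    print(f"{phrase}: {score:.4f}")
```

In this toy run the top-ranked phrase, "smartphone", never appears in the document, which is the vocabulary-gap situation the abstract describes. Because the document is dominated by the "tech" topic, the ambiguous word "apple" mostly triggers technology-related phrases rather than "fruit".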
Similar resources
Topical Word Importance for Fast Keyphrase Extraction
We propose an improvement on a state-of-the-art keyphrase extraction algorithm, Topical PageRank (TPR), incorporating topical information from topic models. While the original algorithm requires a random walk for each topic in the topic model being used, ours is independent of the topic model, computing but a single PageRank for each text regardless of the amount of topics in the model. This in...
Automatic Keyphrase Extraction via Topic Decomposition
Existing graph-based ranking methods for keyphrase extraction compute a single importance score for each word via a single random walk. Motivated by the fact that both documents and words can be represented by a mixture of semantic topics, we propose to decompose traditional random walk into multiple random walks specific to various topics. We thus build a Topical PageRank (TPR) on word graph t...
TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction
Keyphrase extraction is the task of identifying single or multi-word expressions that represent the main topics of a document. In this paper we present TopicRank, a graph-based keyphrase extraction method that relies on a topical representation of the document. Candidate keyphrases are clustered into topics and used as vertices in a complete graph. A graph-based ranking model is applied to assi...
Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming
We introduce a global inference model for keyphrase extraction that reduces overgeneration errors by weighting sets of keyphrase candidates according to their component words. Our model can be applied on top of any supervised or unsupervised word weighting function. Experimental results show a substantial improvement over commonly used word-based ranking approaches.
Unsupervised Keyphrase Extraction with Multipartite Graphs
We propose an unsupervised keyphrase extraction model that encodes topical information within a multipartite graph structure. Our model represents keyphrase candidates and topics in a single graph and exploits their mutually reinforcing relationship to improve candidate ranking. We further introduce a novel mechanism to incorporate keyphrase selection preferences into the model. Experiments con...
Publication date: 2012